Abstract. We prove an inequality between the free entropy and the mutual free Fisher information for two projections, which may be regarded as a free analog of the logarithmic Sobolev inequality. The proof is based on a random matrix approximation procedure via the Grassmannian random matrix model of two projections.



Introduction 



Among the most important notions in classical information theory is the mutual information, formally expressed as

$$I(X, Y) = H(X) + H(Y) - H(X, Y) \tag{0.1}$$

for two random variables $X, Y$ in terms of their Shannon-Gibbs entropies $H(\cdot)$. Motivated by the above expression, in part VI [15] of his series of papers, Voiculescu introduced the notions of the mutual free Fisher information $\varphi^*$ and of the mutual free information $i^*$ via the liberation theory in free probability. These quantities are defined for subalgebras (rather than random variables) of a tracial $W^*$-probability space, while the microstates free entropy $\chi$ as well as the non-microstates one $\chi^*$ is defined for random variables. In the last section of [15], Voiculescu explained that a formula like (0.1),

$$\chi(X_1,\dots,X_n) + \chi(Y_1,\dots,Y_m) = i^*(W^*(X_1,\dots,X_n), W^*(Y_1,\dots,Y_m)) + \chi(X_1,\dots,X_n,Y_1,\dots,Y_m), \tag{0.2}$$

cannot be true in general, and he suggested the necessity of generalizing the free entropy to more general objects beyond self-adjoint variables and proposed, for instance, how to define the free entropy for projections. Note here that the free entropy $\chi$ for self-adjoint variables is always $-\infty$ for projections.

On the other hand, a large deviation principle recently obtained in [6] is related to a pair $(P(N), Q(N))$ of random projection matrices having independent and unitarily invariant distributions, provided that $\lim_N \mathrm{rank}(P(N))/N$ and $\lim_N \mathrm{rank}(Q(N))/N$ exist as constants. Indeed, the random pair $(P(N), Q(N))$ induces a random tracial state $\tau_N$ on $C^*(\mathbb{Z}_2 * \mathbb{Z}_2)$, the universal $C^*$-algebra generated by two projections, and the large deviation principle is concerned with the empirical measure of $\tau_N$. An important fact there is that the rate function $\mathcal{I}(\tau)$ of tracial states $\tau$ is equal to the negative of the free entropy $\chi_{\mathrm{proj}}(p, q)$ of the two projection generators $(p, q)$ in the GNS representation with respect to $\tau$.

The main aim of this paper is to prove the inequality

$$-\chi_{\mathrm{proj}}(p, q) \le \varphi^*(p : q) \tag{0.3}$$

for two projections $(p, q)$ under mild assumptions, where $\varphi^*(p : q)$ is the mutual free Fisher information of the subalgebras $\mathbb{C}p + \mathbb{C}(1 - p)$ and $\mathbb{C}q + \mathbb{C}(1 - q)$. The proof is based on a random matrix approximation procedure derived from the large deviation principle mentioned above. In fact, the inequality (0.3) arises as a scaling limit of the classical logarithmic Sobolev inequality due to Bakry and Emery applied to a Grassmannian random matrix pair modeled on the pair $(p, q)$. Thus we may consider (0.3) as a kind of free probabilistic logarithmic Sobolev inequality. Such free analogs were previously obtained in [3] for single self-adjoint variables and in [7] for single unitary variables. Afterwards, a remarkable approach to such free analogs was given in [10] for single self-adjoint variables.

The paper is organized as follows. First in §1, we briefly recall the definitions of the free entropy $\chi_{\mathrm{proj}}$ for projections and of the mutual free Fisher information $\varphi^*$. For convenience of reference, the explicit forms of $\chi_{\mathrm{proj}}(p, q)$ in Proposition 1.1 and of $\varphi^*(p : q)$ in Proposition 1.2 for two projections are mentioned. §2 is devoted to the proof of (0.3) based on the classical logarithmic Sobolev inequality in [1] via the Grassmannian random matrix approximation. Here we need the Ricci curvature tensor of the Grassmannian manifold to verify Bakry and Emery's $\Gamma_2$-criterion. §3 contains supplementary remarks. We note that $-\chi_{\mathrm{proj}}(p, q)$ appears as a scaling limit of the classical mutual information on the Grassmannian manifold; this seems natural because $-\chi_{\mathrm{proj}}(p, q) = \chi_{\mathrm{proj}}(p) + \chi_{\mathrm{proj}}(q) - \chi_{\mathrm{proj}}(p, q)$ (due to $\chi_{\mathrm{proj}}(p) = \chi_{\mathrm{proj}}(q) = 0$) has a form like (0.1). From the viewpoint in [15], this form also suggests that $-\chi_{\mathrm{proj}}(p, q)$ should coincide with the mutual free information $i^*(p, q)$. In fact, we make a heuristic computation to indicate that $i^*(p, q) = -\chi_{\mathrm{proj}}(p, q)$, a very particular case of (0.2), since $\chi_{\mathrm{proj}}(p) = 0$ for any single projection $p$. Finally, the inequality (0.3) is slightly generalized into a relative version including a certain potential term.

Acknowledgment. The authors thank Prof. Masaki Izumi for fruitful discussions. 



1. Preliminaries 

1.1. Free entropy for projections. For $N \in \mathbb{N}$ let $\mathrm{U}(N)$ be the unitary group of order $N$. For $k \in \{0, 1, \dots, N\}$ let $G(N, k)$ denote the set of all $N \times N$ orthogonal projection matrices of rank $k$, that is, $G(N, k)$ is identified with the so-called Grassmannian manifold consisting of $k$-dimensional subspaces in $\mathbb{C}^N$. Let $P_N(k)$ be the diagonal matrix with the first $k$ diagonal entries 1 and the others 0. Each $P \in G(N, k)$ is diagonalized as

$$P = U P_N(k) U^*, \tag{1.1}$$

where $U \in \mathrm{U}(N)$ is determined modulo $\mathrm{U}(k) \oplus \mathrm{U}(N - k)$. Hence $G(N, k)$ is identified with the homogeneous space $\mathrm{U}(N)/(\mathrm{U}(k) \oplus \mathrm{U}(N - k))$, and the unitarily invariant probability measure on $G(N, k)$ corresponds to the measure on $\mathrm{U}(N)/(\mathrm{U}(k) \oplus \mathrm{U}(N - k))$ induced from the Haar probability measure $\gamma_{\mathrm{U}(N)}$ on $\mathrm{U}(N)$. We denote by $\gamma_{G(N,k)}$ this unitarily invariant measure on $G(N, k)$. Let $\zeta_{N,k} : \mathrm{U}(N) \to G(N, k)$ be the (surjective continuous) map defined by the equation (1.1), i.e., $\zeta_{N,k}(U) := U P_N(k) U^*$. Then the measure $\gamma_{G(N,k)}$ is more explicitly written as

$$\gamma_{G(N,k)} = \gamma_{\mathrm{U}(N)} \circ \zeta_{N,k}^{-1}. \tag{1.2}$$
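The construction of the unitarily invariant measure on $G(N,k)$ as the push-forward of the Haar measure under $U \mapsto U P_N(k) U^*$ can be sketched numerically as follows. This is only an illustration, not part of the paper: the helper names `haar_unitary` and `random_projection` are hypothetical, and the Haar sampler uses the standard QR-with-phase-correction recipe.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    # Rescale each column by the phase of the diagonal of R so that the law is Haar.
    return q * (d / np.abs(d))

def random_projection(n, k, rng):
    """Return P = U P_N(k) U^* with U Haar distributed (a sample from gamma_{G(N,k)})."""
    u = haar_unitary(n, rng)
    p_nk = np.diag([1.0] * k + [0.0] * (n - k)).astype(complex)
    return u @ p_nk @ u.conj().T

rng = np.random.default_rng(0)
P = random_projection(6, 2, rng)
rank_P = int(round(np.real(np.trace(P))))
```

Each sample is an orthogonal projection of rank $k$; only the coset of $U$ modulo $\mathrm{U}(k) \oplus \mathrm{U}(N-k)$ matters for the resulting $P$.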

Let $(p_1, \dots, p_n)$ be an $n$-tuple of projections in a tracial $W^*$-probability space $(\mathcal{M}, \tau)$ with $\alpha_i := \tau(p_i)$, $1 \le i \le n$. Following Voiculescu's proposal in [15, 14.2] we define the free entropy $\chi_{\mathrm{proj}}(p_1, \dots, p_n)$ of $(p_1, \dots, p_n)$ as follows: Choose $k(N, i) \in \{0, 1, \dots, N\}$ for each $N \in \mathbb{N}$ and $1 \le i \le n$ in such a way that $k(N, i)/N \to \alpha_i$ as $N \to \infty$ for $1 \le i \le n$. For each $m \in \mathbb{N}$ and $\varepsilon > 0$ set

$$\Gamma_{\mathrm{proj}}(p_1, \dots, p_n; k(N,1), \dots, k(N,n); N, m, \varepsilon) := \Big\{ (P_1, \dots, P_n) \in \prod_{i=1}^n G(N, k(N,i)) : \big|\mathrm{tr}_N(P_{i_1} \cdots P_{i_r}) - \tau(p_{i_1} \cdots p_{i_r})\big| < \varepsilon \text{ for all } 1 \le i_1, \dots, i_r \le n,\ 1 \le r \le m \Big\},$$

where $\mathrm{tr}_N$ stands for the normalized trace on the $N \times N$ matrices. We then define

$$\chi_{\mathrm{proj}}(p_1, \dots, p_n) := \inf_{m \in \mathbb{N},\, \varepsilon > 0} \limsup_{N \to \infty} \frac{1}{N^2} \log \Big( \bigotimes_{i=1}^n \gamma_{G(N,k(N,i))} \Big) \big( \Gamma_{\mathrm{proj}}(p_1, \dots, p_n; k(N,1), \dots, k(N,n); N, m, \varepsilon) \big). \tag{1.3}$$



It is easy to see that the above definition of $\chi_{\mathrm{proj}}(p_1, \dots, p_n)$ is independent of the choices of $k(N, i)$ with $k(N, i)/N \to \alpha_i$ for $1 \le i \le n$. The free entropy $\chi_{\mathrm{proj}}$ for projections has properties similar to those for self-adjoint variables developed in [12, 13, 14] and for unitary variables in [5, Chapter 6], which we will discuss elsewhere [8]. It is obvious that $\chi_{\mathrm{proj}}(p) = 0$ for any single projection $p$.

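In this paper we are concerned with the free entropy of two projections. Let $(p, q)$ be a pair of projections in $(\mathcal{M}, \tau)$ with $\alpha := \tau(p)$ and $\beta := \tau(q)$. Set

$$E_{11} := p \wedge q, \quad E_{10} := p \wedge q^\perp, \quad E_{01} := p^\perp \wedge q, \quad E_{00} := p^\perp \wedge q^\perp,$$
$$E := 1 - (E_{00} + E_{01} + E_{10} + E_{11})$$

and $\alpha_{ij} := \tau(E_{ij})$ for $i, j = 0, 1$. Then $E$ and $E_{ij}$ are in the center of $\mathcal{N} := \{p, q\}''$ and $(E\mathcal{N}E, \tau|_{E\mathcal{N}E})$ is isomorphic to $L^\infty((0,1), \nu; M_2(\mathbb{C}))$, the $L^\infty$-algebra of $M_2(\mathbb{C})$-valued functions, where $\nu$ is a measure on $(0,1)$ with $\nu((0,1)) = 1 - \sum_{i,j=0}^1 \alpha_{ij}$. Here $EpE$ and $EqE$ correspond to

$$t \in (0,1) \mapsto \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad t \in (0,1) \mapsto \begin{bmatrix} t & \sqrt{t(1-t)} \\ \sqrt{t(1-t)} & 1-t \end{bmatrix},$$

respectively, and $\tau|_{E\mathcal{N}E}$ is represented as

$$\tau(a) = \int_0^1 \mathrm{tr}_2(a(t))\, d\nu(t) \tag{1.4}$$

for $a \in E\mathcal{N}E$ corresponding to $a(\cdot) \in L^\infty((0,1), \nu; M_2(\mathbb{C}))$. In this way, the mixed moments of $(p, q)$ with respect to $\tau$ are determined by the data $(\nu, \{\alpha_{ij}\}_{i,j=0}^1)$. Although $\nu$ is not necessarily a probability measure, we define the free entropy $\Sigma(\nu)$ by

$$\Sigma(\nu) := \int_0^1 \int_0^1 \log|x - y|\, d\nu(x)\, d\nu(y)$$

in the same fashion as in [15]. Furthermore, we set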



$$\rho := \min\{\alpha, \beta, 1 - \alpha, 1 - \beta\} = \frac{1}{2}\Big( 1 - \sum_{i,j=0}^1 \alpha_{ij} \Big), \tag{1.5}$$

$$C := \rho^2 B\Big( \frac{|\alpha - \beta|}{\rho}, \frac{|\alpha + \beta - 1|}{\rho} \Big) \tag{1.6}$$

(meant zero if $\rho = 0$), where

$$B(s, t) := \frac{(1+s)^2}{2}\log(1+s) - \frac{s^2}{2}\log s + \frac{(1+t)^2}{2}\log(1+t) - \frac{t^2}{2}\log t - \frac{(2+s+t)^2}{2}\log(2+s+t) + \frac{(1+s+t)^2}{2}\log(1+s+t)$$

for $s, t \ge 0$. With these definitions, the following expression was obtained in [6] as a consequence of the large deviation principle for an independent pair of random projection matrices.

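Proposition 1.1. ([6, Theorem 3.2, Proposition 3.3]) If $\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0$, then

$$\chi_{\mathrm{proj}}(p, q) = \frac{1}{4}\Sigma(\nu) + \frac{\alpha_{01} + \alpha_{10}}{2} \int_0^1 \log x\, d\nu(x) + \frac{\alpha_{00} + \alpha_{11}}{2} \int_0^1 \log(1 - x)\, d\nu(x) - C,$$

and otherwise $\chi_{\mathrm{proj}}(p, q) = -\infty$.

It is known [6] that limsup in definition (1.3) can be replaced by lim in the case of two projections. Furthermore, it was shown there that $\chi_{\mathrm{proj}}(p, q) = 0$ if and only if $p$ and $q$ are free. (Note that this fact still remains valid even for general $n$-tuples of projections, whose proof will be given in [8].) Note that the condition $\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0$ is equivalent to

$$\alpha_{11} = \max\{\alpha + \beta - 1, 0\}, \quad \alpha_{00} = \max\{1 - \alpha - \beta, 0\}, \quad \alpha_{10} = \max\{\alpha - \beta, 0\}, \quad \alpha_{01} = \max\{\beta - \alpha, 0\};$$

in this case, $\alpha_{01} + \alpha_{10} = |\alpha - \beta|$ and $\alpha_{00} + \alpha_{11} = |\alpha + \beta - 1|$.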
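As a sanity check of the constants entering the formula for the free entropy of two projections, the following sketch evaluates $\rho$, $B(s,t)$ and $C$ and tests the free case $\alpha = \beta = 1/2$. It rests on assumptions that are not spelled out in the text above: the function names are hypothetical, the form of $B$ is the one with all denominators equal to 2 and a minus sign on the $(2+s+t)^2$ term, the entropy formula used is $\chi_{\mathrm{proj}} = \frac14\Sigma(\nu) - C$ when $\alpha_{ij} = 0$, and for free projections of trace $1/2$ the measure $\nu$ is taken to be the arcsine law on $(0,1)$, whose logarithmic energy is $\Sigma(\nu) = -2\log 2$.

```python
import math

def half_sq_log(u):
    """Return (u^2 / 2) * log(u), with the convention 0 * log 0 = 0."""
    return 0.0 if u == 0 else 0.5 * u * u * math.log(u)

def B(s, t):
    return (half_sq_log(1 + s) - half_sq_log(s)
            + half_sq_log(1 + t) - half_sq_log(t)
            - half_sq_log(2 + s + t) + half_sq_log(1 + s + t))

def rho(alpha, beta):
    return min(alpha, beta, 1 - alpha, 1 - beta)

def C(alpha, beta):
    r = rho(alpha, beta)
    if r == 0:
        return 0.0  # convention: C is meant zero if rho = 0
    return r * r * B(abs(alpha - beta) / r, abs(alpha + beta - 1) / r)

def alpha_ij(alpha, beta):
    """Trace values forced by alpha_00 alpha_11 = alpha_01 alpha_10 = 0."""
    return {"11": max(alpha + beta - 1, 0.0), "00": max(1 - alpha - beta, 0.0),
            "10": max(alpha - beta, 0.0), "01": max(beta - alpha, 0.0)}

# Free projections with alpha = beta = 1/2 (assumed: nu is the arcsine law).
sigma_arcsine = -2.0 * math.log(2.0)
chi_free = 0.25 * sigma_arcsine - C(0.5, 0.5)
```

Under these assumptions `chi_free` vanishes, in agreement with the statement that the free entropy is zero exactly when $p$ and $q$ are free.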

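1.2. Mutual free Fisher information. Let $\mathcal{A}$ ($\ni 1$) and $\mathcal{B}$ ($\ni 1$) be two $*$-subalgebras in $(\mathcal{M}, \tau)$, which are assumed to be algebraically free. Let $\mathcal{A} \vee \mathcal{B}$ and $W^*(\mathcal{A} \cup \mathcal{B})$ denote the subalgebra and the von Neumann subalgebra, respectively, generated by $\mathcal{A} \cup \mathcal{B}$. Let $\delta_{\mathcal{A}:\mathcal{B}}$ be the derivation from $\mathcal{A} \vee \mathcal{B}$ into the $\mathcal{A} \vee \mathcal{B}$-bimodule $(\mathcal{A} \vee \mathcal{B}) \otimes (\mathcal{A} \vee \mathcal{B})$ uniquely determined by

$$\delta_{\mathcal{A}:\mathcal{B}}(a) = a \otimes 1 - 1 \otimes a \ \text{ for } a \in \mathcal{A}, \qquad \delta_{\mathcal{A}:\mathcal{B}}(b) = 0 \ \text{ for } b \in \mathcal{B}.$$

If there is an element $\xi \in L^1(W^*(\mathcal{A} \cup \mathcal{B}))$ such that

$$\tau(\xi x) = (\tau \otimes \tau)(\delta_{\mathcal{A}:\mathcal{B}}(x)), \qquad x \in \mathcal{A} \vee \mathcal{B},$$

then $\xi$ is called the liberation gradient of $(\mathcal{A}, \mathcal{B})$ and denoted by $j(\mathcal{A} : \mathcal{B})$. Voiculescu [15] introduced the mutual free Fisher information of $\mathcal{A}$ relative to $\mathcal{B}$ by

$$\varphi^*(\mathcal{A} : \mathcal{B}) := \|j(\mathcal{A} : \mathcal{B})\|_2^2$$

($\|\cdot\|_2$ stands for the $L^2$-norm with respect to $\tau$) if $j(\mathcal{A} : \mathcal{B})$ exists in $L^2(W^*(\mathcal{A} \cup \mathcal{B}))$; otherwise $\varphi^*(\mathcal{A} : \mathcal{B}) := +\infty$. See [15] for more about the mutual free Fisher information.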

Let $(p, q)$ be a pair of projections in $(\mathcal{M}, \tau)$ and set $\mathcal{A} := \mathbb{C}p + \mathbb{C}(1 - p)$, $\mathcal{B} := \mathbb{C}q + \mathbb{C}(1 - q)$. Then the liberation gradient $j(\mathcal{A} : \mathcal{B})$ and the mutual free Fisher information $\varphi^*(\mathcal{A} : \mathcal{B})$ were computed in [15]. Here, recall that the Hilbert transform of a function $f$ with $f(x)/(1 + |x|) \in L^1(\mathbb{R}, dx)$ is defined to be

$$(Hf)(x) := \lim_{\varepsilon \searrow 0} (H_\varepsilon f)(x) \quad\text{with}\quad (H_\varepsilon f)(x) := \int_{|x - t| > \varepsilon} \frac{f(t)}{x - t}\, dt$$

(whenever the limit exists almost everywhere). The following is a slightly improved version of [15, Proposition 12.7]:

Proposition 1.2. With the same notations as in §§1.1 assume that $\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0$, $\nu$ has the density $f := d\nu/dx \in L^3((0,1), x(1-x)dx)$ and moreover

$$\int_0^1 \Big( \frac{\alpha_{01} + \alpha_{10}}{x} + \frac{\alpha_{00} + \alpha_{11}}{1 - x} \Big) f(x)\, dx < +\infty. \tag{1.8}$$

Define $X := pqp + (1 - p)(1 - q)(1 - p)$ and

$$\phi(x) := (Hf)(x) + \frac{\alpha_{01} + \alpha_{10}}{x} - \frac{\alpha_{00} + \alpha_{11}}{1 - x} \quad\text{for } 0 < x < 1.$$

Then

$$j(\mathcal{A} : \mathcal{B}) = [q, p]\,\phi(EXE) \in L^2(\mathcal{M}, \tau)$$

and hence

$$\varphi^*(\mathcal{A} : \mathcal{B}) = \int_0^1 \phi(x)^2 f(x)\, x(1 - x)\, dx < +\infty.$$

The assumption (1.8) can be reduced when $\alpha = \beta$ or $\alpha + \beta = 1$ thanks to (1.7). In fact, (1.8) is void when $\alpha = \beta = 1/2$; it means $\int_0^1 x^{-1} f(x)\, dx < +\infty$ when $\alpha + \beta = 1$ but $\alpha \ne \beta$. All the assumptions of Proposition 1.2 are satisfied in particular when $p$ and $q$ are free (see [5, Example 3.6.7]). Note [15, Propositions 5.17 and 9.3.c] that $j(\mathcal{A} : \mathcal{B}) = 0$ (or equivalently $\varphi^*(\mathcal{A} : \mathcal{B}) = 0$) if and only if $p$ and $q$ are free.

In [15, §12] the support of $\nu$ was assumed to be an infinite set to guarantee that $\mathcal{A}$ and $\mathcal{B}$ are algebraically free. As long as $\nu \ne 0$, that is automatically satisfied from the assumption of $\nu$ having the density. In the case where $\nu = 0$, so that $\rho = 0$ by (1.5), it follows that $p \in \{0, 1\}$ or $q \in \{0, 1\}$; hence Proposition 1.2 trivially holds.

Proof of Proposition 1.2. We first remark a weighted norm version of the so-called M. Riesz theorem for the Hilbert transform. Since $\int_0^1 (x(1-x))^{-1/2}\, dx < +\infty$, the celebrated weighted norm inequality for the Hilbert transform [9, Theorem 8] shows that there is a constant $C_w > 0$ depending only on the weight function $w(x) := 1_{[0,1]}(x)\, x(1-x)$ such that for every function $g$ (whose $Hg$ can be defined)

$$\|Hg\|_{w,3} \le C_w \|g\|_{w,3} \tag{1.9}$$

with the weighted norm

$$\|g\|_{w,p} := \Big( \int_0^1 |g(x)|^p\, x(1-x)\, dx \Big)^{1/p} \quad\text{for } 1 \le p < \infty,$$

and moreover $\|H_\varepsilon g - Hg\|_{w,3} \to 0$ as $\varepsilon \searrow 0$ whenever $\|g\|_{w,3} < +\infty$. In what follows we use the same symbols as in [15, §12] with small exceptions: $p, q, \mathcal{A}, \mathcal{B}$ and $x, x_1, x_2$ are used instead of $P, Q, A, B$ and $t, t_1, t_2$, respectively. By the facts mentioned above one easily has

$$\int_0^1\!\!\int_0^1 \Big( \frac{x_1^{n+1} - x_2^{n+1}}{x_1 - x_2} - \frac{x_1^n - x_2^n}{x_1 - x_2} \Big) d\nu(x_1)\, d\nu(x_2)$$
$$= -\lim_{\varepsilon \searrow 0} \iint_{|x_1 - x_2| > \varepsilon} \Big( x_1^{n-1}\, x_1(1 - x_1)\, \frac{f(x_2)}{x_1 - x_2}\, f(x_1) + x_2^{n-1}\, x_2(1 - x_2)\, \frac{f(x_1)}{x_2 - x_1}\, f(x_2) \Big) dx_1\, dx_2$$
$$= -2 \lim_{\varepsilon \searrow 0} \int_0^1 x^{n-1} (H_\varepsilon f)(x) f(x)\, x(1-x)\, dx = -2 \int_0^1 x^{n-1} (Hf)(x) f(x)\, x(1-x)\, dx.$$

Hence, the assertion of [15, Lemma 12.6] can be changed to

1 



!(r r)o.W!(Wl ,r )-~ / x n ~ 1 (H f)U )/(.,■),■( i -,-)<l.r 



o 



+ f 1 x n~l (x _ 1) Mx) + a °° + Ql1 x n-l Mx) (1. 10 ) 

/o 2 j 



2 



under the assumptions of Proposition 1.2. The rest of the proof goes along the same line as [15, Proposition 12.7], with [15, Lemma 12.6] replaced by (1.10). □

2. An inequality 

Our aim in this section is to obtain the following inequality between the free entropy $\chi_{\mathrm{proj}}$ and the mutual free Fisher information $\varphi^*$ for a pair $(p, q)$ of projections in a tracial $W^*$-probability space $(\mathcal{M}, \tau)$. For simplicity we hereafter write $\varphi^*(p : q)$ for the mutual free Fisher information $\varphi^*(\mathbb{C}p + \mathbb{C}(1 - p) : \mathbb{C}q + \mathbb{C}(1 - q))$ (see §§1.2).

Theorem 2.1. With the same assumptions as stated in Proposition 1.2,

$$-\chi_{\mathrm{proj}}(p, q) \le \varphi^*(p : q).$$

The main idea of the proof is a random matrix approximation procedure based on the large deviation shown in [6]. In fact, we apply Bakry and Emery's logarithmic Sobolev inequality in [1] to random projection matrix pairs (or probability measures on the product of two Grassmannian manifolds) and pass to the scaling limit as the matrix size goes to $\infty$. Thus our inequality can be regarded as a kind of free probability counterpart of the logarithmic Sobolev inequality. A further discussion on this aspect will be given in the next section.

The so-called $\Gamma_2$-criterion of Bakry and Emery is crucial in their logarithmic Sobolev inequality in the Riemannian manifold setting, and the Ricci curvature tensor is one of the important ingredients of the criterion. We thus need to compute the Ricci curvature tensor $\mathrm{Ric}(G(N, k))$ of $G(N, k)$, $1 \le k \le N - 1$, as described below. Let $\mathfrak{u}(N)$ be the Lie algebra of $\mathrm{U}(N)$ and regard $\mathfrak{h}(N, k) := \mathfrak{u}(k) \oplus \mathfrak{u}(N - k)$ as a Lie subalgebra of $\mathfrak{u}(N)$. The tangent space $T_P G(N, k)$ at each $P \in G(N, k)$ can be identified with $\mathfrak{g}(N, k) := \mathfrak{h}(N, k)^\perp$, the orthocomplement of $\mathfrak{h}(N, k)$ in $\mathfrak{u}(N)$ with respect to the Riemannian metric $\langle X, Y \rangle := \mathrm{Re}\,\mathrm{Tr}_N(XY^*)$, where $\mathrm{Tr}_N$ is the usual trace on $N \times N$ matrices. Choose the following complete orthonormal system of $\mathfrak{g}(N, k)$:

$$E_{ij} := \frac{e_{ij} - e_{ji}}{\sqrt{2}}, \qquad F_{ij} := \frac{\sqrt{-1}\,(e_{ij} + e_{ji})}{\sqrt{2}} \tag{2.1}$$

with $1 \le i \le k$, $k + 1 \le j \le N$, where $e_{ij}$ denote the matrix units. According to well-known facts on compact matrix groups and O'Neill's formula (see [4, Proposition 3.17, Theorem 3.61] for example), the Ricci curvature tensor of $G(N, k)$ with respect to the above-mentioned Riemannian metric is computed as follows:

$$\mathrm{Ric}(G(N, k))_P(X, X) = \sum_{1 \le i \le k,\ k+1 \le j \le N} \big( \|[E_{ij}, X]\|_{HS}^2 + \|[F_{ij}, X]\|_{HS}^2 \big), \qquad X \in \mathfrak{g}(N, k).$$

A simple direct computation shows that the above right-hand side is $N\|X\|_{HS}^2$ so that

$$\mathrm{Ric}(G(N, k)) = N I_{2k(N-k)}. \tag{2.2}$$
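The "simple direct computation" behind the Ricci curvature formula can be checked numerically. The sketch below is illustrative only: it assumes that the orthonormal system of $\mathfrak{g}(N,k)$ consists of the matrices $(e_{ij} - e_{ji})/\sqrt{2}$ and $\sqrt{-1}(e_{ij} + e_{ji})/\sqrt{2}$ with $1 \le i \le k < j \le N$, and that the Ricci quadratic form is the sum of squared Hilbert-Schmidt norms of the commutators with these basis elements; all function names are hypothetical.

```python
import numpy as np

def basis_g(n, k):
    """Assumed orthonormal basis of g(N,k) for <X,Y> = Re Tr(X Y^*)."""
    basis = []
    for i in range(k):
        for j in range(k, n):
            e = np.zeros((n, n), dtype=complex)
            e[i, j], e[j, i] = 1.0, -1.0
            f = np.zeros((n, n), dtype=complex)
            f[i, j], f[j, i] = 1j, 1j
            basis += [e / np.sqrt(2), f / np.sqrt(2)]
    return basis

def hs_norm_sq(a):
    """Squared Hilbert-Schmidt norm Tr(a a^*)."""
    return float(np.real(np.trace(a @ a.conj().T)))

def ricci_quadratic_form(x, n, k):
    """Sum of ||[Z, X]||_HS^2 over the basis elements Z of g(N,k)."""
    return sum(hs_norm_sq(z @ x - x @ z) for z in basis_g(n, k))

def random_tangent(n, k, rng):
    """Random skew-Hermitian matrix with vanishing diagonal blocks, i.e. an element of g(N,k)."""
    a = rng.standard_normal((k, n - k)) + 1j * rng.standard_normal((k, n - k))
    x = np.zeros((n, n), dtype=complex)
    x[:k, k:] = a
    x[k:, :k] = -a.conj().T
    return x

rng = np.random.default_rng(1)
n, k = 7, 3
x = random_tangent(n, k, rng)
lhs = ricci_quadratic_form(x, n, k)
rhs = n * hs_norm_sq(x)
```

For a random tangent vector the two quantities `lhs` and `rhs` agree up to rounding, which is the content of the identity $\mathrm{Ric}(X,X) = N\|X\|_{HS}^2$.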
Proof of Theorem 2.1. Let $\alpha$, $\beta$ and $(\nu, \{\alpha_{ij}\}_{i,j=0}^1)$ be as in §§1.1 for the given pair $(p, q)$ of projections. Since the inequality trivially holds if $\nu = 0$, assume $\nu \ne 0$ and let $\nu_1 := \nu((0,1))^{-1}\nu$, the normalization of $\nu$. In addition to the assumptions of Proposition 1.2 we first assume the following (A) and (B):

(A) $\nu$ is supported in $[\delta, 1 - \delta]$ for some $\delta > 0$ and it has the continuous density $d\nu/dx$.

(B) The function

$$Q_{\nu_1}(x) := 2 \int_0^1 \log|x - y|\, d\nu_1(y)$$

is a well-defined $C^1$-function on $[0, 1]$.

Choose $C^1$-functions $h_0(x)$ and $h_1(x)$ on $[0, 1]$ such that

$$h_0(x) \begin{cases} = \log x & (\delta \le x \le 1), \\ \ge \log x & (0 < x < \delta), \end{cases} \qquad h_1(x) \begin{cases} = \log(1 - x) & (0 \le x \le 1 - \delta), \\ \ge \log(1 - x) & (1 - \delta < x < 1). \end{cases}$$

For each $N \in \mathbb{N}$ choose $k(N), l(N) \in \{1, \dots, N - 1\}$ such that $k(N)/N \to \alpha$ and $l(N)/N \to \beta$ as $N \to \infty$, and set

$$n_0(N) := N - \min\{k(N), l(N)\},$$
$$n_1(N) := \max\{k(N) + l(N) - N, 0\},$$
$$n(N) := \min\{k(N), l(N), N - k(N), N - l(N)\} = N - n_0(N) - n_1(N).$$
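The three counts defined above are the multiplicities of the eigenvalue 0, of the eigenvalue 1, and of the remaining eigenvalues in $(0,1)$ of $PQP$ for a generic pair $(P, Q) \in G(N,k) \times G(N,l)$. A minimal numerical sketch, with hypothetical helper names and a fixed random seed, is:

```python
import numpy as np

def random_projection(n, k, rng):
    """Projection onto the column space of an n x k complex Gaussian matrix."""
    z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    q, _ = np.linalg.qr(z)  # reduced QR: orthonormal basis of a random k-dimensional subspace
    return q @ q.conj().T

def counts(n, k, l):
    """Return (n_0, n_1, n) as defined in the text."""
    n0 = n - min(k, l)
    n1 = max(k + l - n, 0)
    nn = min(k, l, n - k, n - l)
    return n0, n1, nn

rng = np.random.default_rng(2)
N, k, l = 8, 5, 6
P, Q = random_projection(N, k, rng), random_projection(N, l, rng)
eig = np.linalg.eigvalsh(P @ Q @ P)
tol = 1e-8
observed = (int(np.sum(eig < tol)),
            int(np.sum(eig > 1 - tol)),
            int(np.sum((eig > tol) & (eig < 1 - tol))))
```

The tolerance `tol` is an ad hoc choice separating exact zeros and ones from the generic eigenvalues; it is adequate for small matrix sizes such as the one used here.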
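For each $N \in \mathbb{N}$, letting

$$\psi_N(x) := \frac{n(N)}{N} Q_{\nu_1}(x) + \frac{|k(N) - l(N)|}{N} h_0(x) + \frac{|k(N) + l(N) - N|}{N} h_1(x), \qquad 0 \le x \le 1,$$

we define a probability measure (regarded as a pair of $N \times N$ random projection matrices) $\lambda_N^{\psi_N}$ on $G(N, k(N)) \times G(N, l(N))$ by

$$d\lambda_N^{\psi_N}(P, Q) := \frac{1}{Z_N^{\psi_N}} \exp\big( -N\, \mathrm{Tr}_N(\psi_N(PQP)) \big)\, d\big( \gamma_{G(N,k(N))} \otimes \gamma_{G(N,l(N))} \big)(P, Q) \tag{2.3}$$

with the normalization constant $Z_N^{\psi_N}$ as well as the reference measure $\lambda_N^0 := \gamma_{G(N,k(N))} \otimes \gamma_{G(N,l(N))}$. When $(P, Q) \in G(N, k(N)) \times G(N, l(N))$ is distributed under $\lambda_N^0$, it is known [6, (2.1)] that the eigenvalues of $PQP$ are

$$\underbrace{0, \dots, 0}_{n_0(N)\ \text{times}},\ \underbrace{1, \dots, 1}_{n_1(N)\ \text{times}},\ x_1, \dots, x_{n(N)} \tag{2.4}$$

and the joint distribution of $(x_1, \dots, x_{n(N)})$ is

$$d\tilde\lambda_N^0(x_1, \dots, x_{n(N)}) := \frac{1}{\tilde Z_N^0} \prod_{i=1}^{n(N)} x_i^{|k(N) - l(N)|} (1 - x_i)^{|k(N) + l(N) - N|} \prod_{1 \le i < j \le n(N)} (x_i - x_j)^2 \prod_{i=1}^{n(N)} 1_{[0,1]}(x_i)\, dx_i \tag{2.5}$$

with the normalization constant $\tilde Z_N^0$. Hence it turns out that when $(P, Q) \in G(N, k(N)) \times G(N, l(N))$ is distributed under $\lambda_N^{\psi_N}$, the eigenvalues of $PQP$ are listed as in (2.4) but the joint distribution of $(x_1, \dots, x_{n(N)})$ is changed to

$$d\tilde\lambda_N^{\psi_N}(x_1, \dots, x_{n(N)}) := \frac{1}{\tilde Z_N^{\psi_N}} \exp\Big( -\sum_{i=1}^{n(N)} \big\{ n(N) Q_{\nu_1}(x_i) + |k(N) - l(N)| (h_0(x_i) - \log x_i) + |k(N) + l(N) - N| (h_1(x_i) - \log(1 - x_i)) \big\} \Big) \prod_{1 \le i < j \le n(N)} (x_i - x_j)^2 \prod_{i=1}^{n(N)} 1_{[0,1]}(x_i)\, dx_i \tag{2.6}$$

with another normalization constant $\tilde Z_N^{\psi_N}$.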

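Similarly to [6, Proposition 2.1] and [5, §5.5] we have

(a) The limit $C' := \lim_{N \to \infty} \frac{1}{N^2} \log \tilde Z_N^{\psi_N}$ exists as well as $C = \lim_{N \to \infty} \frac{1}{N^2} \log \tilde Z_N^0$ (see (1.6)).

(b) When $(x_1, \dots, x_{n(N)})$ is distributed under $\tilde\lambda_N^{\psi_N}$, the empirical measure

$$\frac{\delta_{x_1} + \cdots + \delta_{x_{n(N)}}}{n(N)}$$

satisfies the large deviation principle in the scale $1/N^2$ with the rate function

$$\mathcal{I}(\mu) := -\rho^2 \Sigma(\mu) + \rho^2 \int_0^1 F(x)\, d\mu(x) + C' \quad\text{for } \mu \in \mathcal{M}([0, 1]),$$

where $\mathcal{M}([0, 1])$ is the set of probability measures on $[0, 1]$, $\rho$ is given in (1.5) and

$$F(x) := Q_{\nu_1}(x) + \frac{|\alpha - \beta|}{\rho} (h_0(x) - \log x) + \frac{|\alpha + \beta - 1|}{\rho} (h_1(x) - \log(1 - x))$$

for $0 < x < 1$.

(c) $\nu_1$ is a unique minimizer of $\mathcal{I}$ with $\mathcal{I}(\nu_1) = 0$.

The last assertion follows from [11, I.1.3 and I.3.1] because by the construction of $h_0$ and $h_1$ we get

$$Q_{\nu_1}(x) \begin{cases} = F(x) & \text{if } x \in [\delta, 1 - \delta] \supset \mathrm{supp}\,\nu_1, \\ \le F(x) & \text{for } x \in [0, 1]. \end{cases}$$

Furthermore, the above large deviation yields:

(d) The mean eigenvalue distribution

$$\hat\lambda_N^{\psi_N} := \int \frac{\delta_{x_1} + \cdots + \delta_{x_{n(N)}}}{n(N)}\, d\tilde\lambda_N^{\psi_N}(x_1, \dots, x_{n(N)})$$

weakly converges to $\nu_1$ as $N \to \infty$.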
Since the Riemannian manifold $G(N, k(N)) \times G(N, l(N))$ has the volume measure $\lambda_N^0$ and its Ricci curvature tensor is $N I_{2k(N)(N - k(N)) + 2l(N)(N - l(N))}$ by (2.2), the classical logarithmic Sobolev inequality due to Bakry and Emery implies that

$$S(\lambda_N^{\psi_N}, \lambda_N^0) \le \frac{1}{2N} \int_{G(N,k(N)) \times G(N,l(N))} \Big\| \nabla \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0} \Big\|_{HS}^2\, d\lambda_N^{\psi_N}, \tag{2.7}$$

where the left-hand side is the relative entropy of $\lambda_N^{\psi_N}$ with respect to $\lambda_N^0$ and the gradient $\nabla \log(d\lambda_N^{\psi_N}/d\lambda_N^0)(P, Q)$ is considered in $\mathfrak{g}(N, k(N)) \oplus \mathfrak{g}(N, l(N))$ via the natural identification $T_{(P,Q)}\big( G(N, k(N)) \times G(N, l(N)) \big) = \mathfrak{g}(N, k(N)) \oplus \mathfrak{g}(N, l(N))$. By (2.3)-(2.6) notice that

$$\frac{d\lambda_N^{\psi_N}}{d\lambda_N^0}(P, Q) = \frac{1}{Z_N^{\psi_N}} \exp\big( -N\, \mathrm{Tr}_N(\psi_N(PQP)) \big) = \frac{\tilde Z_N^0}{\tilde Z_N^{\psi_N}} \exp\Big( -N \sum_{i=1}^{n(N)} \psi_N(x_i) \Big)$$

for $(P, Q) \in G(N, k(N)) \times G(N, l(N))$ and for the eigenvalues $(x_1, \dots, x_{n(N)})$ of $PQP$ except $n_0(N)$ zeros and $n_1(N)$ ones (see (2.4)). Hence we get



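$$S(\lambda_N^{\psi_N}, \lambda_N^0) = \int_{G(N,k(N)) \times G(N,l(N))} \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0}(P, Q)\, d\lambda_N^{\psi_N}(P, Q)$$
$$= \log \tilde Z_N^0 - \log \tilde Z_N^{\psi_N} + \int_{[0,1]^{n(N)}} \Big( -N \sum_{i=1}^{n(N)} \psi_N(x_i) \Big)\, d\tilde\lambda_N^{\psi_N}(x_1, \dots, x_{n(N)})$$
$$= \log \tilde Z_N^0 - \log \tilde Z_N^{\psi_N} - N n(N) \int_0^1 \psi_N(x)\, d\hat\lambda_N^{\psi_N}(x).$$

Since $\psi_N(x)$ converges to

$$\rho Q_{\nu_1}(x) + |\alpha - \beta| h_0(x) + |\alpha + \beta - 1| h_1(x)$$

uniformly on $[0, 1]$, it follows from (a) and (d) above that

$$\lim_{N \to \infty} \frac{1}{N^2} S(\lambda_N^{\psi_N}, \lambda_N^0) = C - C' - \rho \int_0^1 \big( \rho Q_{\nu_1}(x) + |\alpha - \beta| h_0(x) + |\alpha + \beta - 1| h_1(x) \big)\, d\nu_1(x).$$

Since (c) gives

$$-C' = -\rho^2 \Sigma(\nu_1) + \rho^2 \int_0^1 F(x)\, d\nu_1(x) = -\rho^2 \Sigma(\nu_1) + \rho^2 \int_0^1 Q_{\nu_1}(x)\, d\nu_1(x),$$

we have

$$\lim_{N \to \infty} \frac{1}{N^2} S(\lambda_N^{\psi_N}, \lambda_N^0) = C - \rho^2 \Sigma(\nu_1) - \rho \int_0^1 \big( |\alpha - \beta| h_0(x) + |\alpha + \beta - 1| h_1(x) \big)\, d\nu_1(x) = -\chi_{\mathrm{proj}}(p, q) \tag{2.8}$$

thanks to Proposition 1.1 and (1.7) together with $\nu_1 = (2\rho)^{-1}\nu$ (see [6, (3.4)]).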



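On the other hand, since

$$\nabla \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0}(P, Q) = -N \nabla\big( \mathrm{Tr}_N(\psi_N(PQP)) \big),$$

one can compute

$$\Big\| \nabla \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0}(P, Q) \Big\|_{HS}^2 = 4N^2\, \mathrm{Tr}_N\big( \psi_N'(PQP)^2\, PQP\, (I - PQP) \big),$$

whose short proof will be given as Lemma 2.2 below for completeness. Therefore,

$$\int_{G(N,k(N)) \times G(N,l(N))} \Big\| \nabla \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0}(P, Q) \Big\|_{HS}^2\, d\lambda_N^{\psi_N}(P, Q)$$
$$= 4N^2 \int_{[0,1]^{n(N)}} \sum_{i=1}^{n(N)} \psi_N'(x_i)^2\, x_i(1 - x_i)\, d\tilde\lambda_N^{\psi_N}(x_1, \dots, x_{n(N)})$$
$$= 4N^2 n(N) \int_0^1 \psi_N'(x)^2\, x(1 - x)\, d\hat\lambda_N^{\psi_N}(x)$$
$$= 4 n(N) \int_0^1 \big( n(N) Q_{\nu_1}'(x) + |k(N) - l(N)| h_0'(x) + |k(N) + l(N) - N| h_1'(x) \big)^2 x(1 - x)\, d\hat\lambda_N^{\psi_N}(x),$$

and thus by (d) we have

$$\lim_{N \to \infty} \frac{1}{2N^3} \int_{G(N,k(N)) \times G(N,l(N))} \Big\| \nabla \log \frac{d\lambda_N^{\psi_N}}{d\lambda_N^0} \Big\|_{HS}^2\, d\lambda_N^{\psi_N}$$
$$= 2\rho \int_0^1 \big( \rho Q_{\nu_1}'(x) + |\alpha - \beta| h_0'(x) + |\alpha + \beta - 1| h_1'(x) \big)^2 x(1 - x)\, d\nu_1(x)$$
$$= \int_0^1 \Big( \rho Q_{\nu_1}'(x) + \frac{|\alpha - \beta|}{x} - \frac{|\alpha + \beta - 1|}{1 - x} \Big)^2 x(1 - x)\, d\nu(x) = \varphi^*(p : q) \tag{2.9}$$

thanks to Proposition 1.2, since $\nu_1 = (2\rho)^{-1}\nu$ so that $\rho Q_{\nu_1}'(x) = (Hf)(x)$ for $f := d\nu/dx$. Combining (2.7)-(2.9) yields the desired inequality under the additional assumptions (A) and (B).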

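Next, let us remove (A) and (B). First, suppose that the assumption (A) is still satisfied but (B) is not. For each $\varepsilon > 0$ choose a non-negative $C^\infty$-function $\psi_\varepsilon$ supported in $[-\varepsilon, \varepsilon]$ with $\int \psi_\varepsilon(x)\, dx = 1$. Let $f_\varepsilon := f * \psi_\varepsilon$ for $f := d\nu/dx$ and define $d\nu_\varepsilon(x) := f_\varepsilon(x)\, dx$; then $\nu_\varepsilon$ is a measure supported in a closed proper subinterval of $(0, 1)$ with $\nu_\varepsilon((0, 1)) = 1 - \sum_{i,j=0}^1 \alpha_{ij}$ whenever $\varepsilon$ is small enough. Let $(p_\varepsilon, q_\varepsilon)$ be a pair of projections in some $(\mathcal{M}, \tau)$ corresponding to the representing data $(\nu_\varepsilon, \{\alpha_{ij}\}_{i,j=0}^1)$. (Such a pair can be constructed via the GNS representation of the universal $C^*$-algebra $C^*(\mathbb{Z}_2 * \mathbb{Z}_2)$ with respect to the tracial state corresponding to $(\nu_\varepsilon, \{\alpha_{ij}\}_{i,j=0}^1)$; see [6, §3] and also §§3.3.) Since (A) and (B) are satisfied for $\nu_\varepsilon$, we get

$$-\chi_{\mathrm{proj}}(p_\varepsilon, q_\varepsilon) \le \varphi^*(p_\varepsilon : q_\varepsilon).$$

Since $\|f_\varepsilon - f\|_{w,3} \to 0$ and $\|Hf_\varepsilon - Hf\|_{w,3} \to 0$ as $\varepsilon \searrow 0$ (see the proof of Proposition 1.2 for the weighted norm $\|\cdot\|_{w,3}$), the Hölder inequality together with (1.9) implies that

$$\int_0^1 ((Hf_\varepsilon)(x))^2 f_\varepsilon(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 ((Hf)(x))^2 f(x)\, x(1 - x)\, dx \tag{2.10}$$

and hence

$$\lim_{\varepsilon \searrow 0} \varphi^*(p_\varepsilon : q_\varepsilon) = \varphi^*(p : q).$$

Since $\Sigma(\mu)$ for $\mu \in \mathcal{M}((0, 1))$ is weakly upper semicontinuous (see [5, 5.3.2]), we also have

$$-\chi_{\mathrm{proj}}(p, q) \le \liminf_{\varepsilon \searrow 0} \big( -\chi_{\mathrm{proj}}(p_\varepsilon, q_\varepsilon) \big)$$

so that $-\chi_{\mathrm{proj}}(p, q) \le \varphi^*(p : q)$.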

Finally, suppose only the assumptions stated in Proposition 1.2. For $\delta > 0$ set
and let $(p_\delta, q_\delta)$ be a pair of projections corresponding to $(\nu_\delta, \{\alpha_{ij}\}_{i,j=0}^1)$. Let us denote the density of $\nu_\delta$ by $f_\delta$; then it is immediate to see that $\|f_\delta - f\|_{w,3} \to 0$. To show that $\varphi^*(p_\delta : q_\delta) \to \varphi^*(p : q)$ as $\delta \searrow 0$, it suffices to prove the following convergences as $\delta \searrow 0$:

$$\int_0^1 ((Hf_\delta)(x))^2 f_\delta(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 ((Hf)(x))^2 f(x)\, x(1 - x)\, dx, \tag{2.11}$$

$$\int_0^1 (Hf_\delta)(x)\, x^{-1} f_\delta(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 (Hf)(x)\, x^{-1} f(x)\, x(1 - x)\, dx, \tag{2.12}$$

$$\int_0^1 (Hf_\delta)(x)\, (1 - x)^{-1} f_\delta(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 (Hf)(x)\, (1 - x)^{-1} f(x)\, x(1 - x)\, dx, \tag{2.13}$$

$$\int_0^1 x^{-2} f_\delta(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 x^{-2} f(x)\, x(1 - x)\, dx, \tag{2.14}$$

$$\int_0^1 (1 - x)^{-2} f_\delta(x)\, x(1 - x)\, dx \longrightarrow \int_0^1 (1 - x)^{-2} f(x)\, x(1 - x)\, dx. \tag{2.15}$$

Remark here that (2.12) and (2.14) are unnecessary when $\alpha_{01} + \alpha_{10} = |\alpha - \beta| = 0$, and so are (2.13) and (2.15) when $\alpha_{11} + \alpha_{00} = |\alpha + \beta - 1| = 0$. The convergence (2.11) follows as (2.10) above. Also, (2.14) and (2.15) immediately follow from the hypothesis (1.8). Since (2.12) and (2.13) are similarly shown, let us prove only the former here. Thus, we should assume $\alpha \ne \beta$, and (1.8) means $\int_0^1 x^{-1} f(x)\, dx < +\infty$. By the Hölder inequality together with (1.9) one can estimate

$$\|(Hf_\delta)\, x^{-1} f_\delta - (Hf)\, x^{-1} f\|_{w,1} \le \|H(f_\delta - f)\|_{w,3} \cdot \|x^{-1} f_\delta^{1/2}\|_{w,2} \cdot \|f_\delta^{1/2}\|_{w,6} + \|Hf\|_{w,3} \cdot \|x^{-1} |f_\delta - f|^{1/2}\|_{w,2} \cdot \||f_\delta - f|^{1/2}\|_{w,6}$$
$$\le C_w \|f_\delta - f\|_{w,3} \cdot \|x^{-1} f_\delta^{1/2}\|_{w,2} \cdot \|f_\delta\|_{w,3}^{1/2} + C_w \|f\|_{w,3} \cdot \|x^{-1} |f_\delta - f|^{1/2}\|_{w,2} \cdot \|f_\delta - f\|_{w,3}^{1/2}.$$

Note that

$$\|x^{-1} f_\delta^{1/2}\|_{w,2}^2 \le \int_0^1 x^{-1} f_\delta(x)\, dx \longrightarrow \int_0^1 x^{-1} f(x)\, dx,$$

$$\|x^{-1} |f_\delta - f|^{1/2}\|_{w,2}^2 \le \int_0^1 x^{-1} |f_\delta(x) - f(x)|\, dx \longrightarrow 0$$

as $\delta \searrow 0$, where $\int_0^1 x^{-1} f(x)\, dx < +\infty$ is essential. These imply (2.12) thanks to $f \in L^3((0, 1), x(1 - x)dx)$ and $\|f_\delta - f\|_{w,3} \to 0$.

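Moreover, since $-\log x \le x^{-1}$ near 0 and $-\log(1 - x) \le (1 - x)^{-1}$ near 1, the hypothesis (1.8) implies that

$$\int_0^1 (-\log x) f_\delta(x)\, dx \longrightarrow \int_0^1 (-\log x) f(x)\, dx < +\infty,$$

$$\int_0^1 (-\log(1 - x)) f_\delta(x)\, dx \longrightarrow \int_0^1 (-\log(1 - x)) f(x)\, dx < +\infty$$

as $\delta \searrow 0$ (whenever those are needed) so that

$$-\chi_{\mathrm{proj}}(p, q) \le \liminf_{\delta \searrow 0} \big( -\chi_{\mathrm{proj}}(p_\delta, q_\delta) \big).$$

Hence the proof is completed. □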
Lemma 2.2. Let $\psi$ be a $C^1$-function on $[0, 1]$ and define $\Psi(P, Q) := \mathrm{Tr}_N(\psi(PQP))$ for $(P, Q) \in G(N, k) \times G(N, l)$. Then

$$\|\nabla\Psi(P, Q)\|_{HS}^2 = 4\, \mathrm{Tr}_N\big( \psi'(PQP)^2\, PQP\, (I - PQP) \big)$$

holds for every $(P, Q) \in G(N, k) \times G(N, l)$.

Proof. Write $(X_r)_{r=1}^{2k(N-k)}$ for the orthonormal basis of $\mathfrak{g}(N, k)$ given in (2.1) and also $(Y_s)_{s=1}^{2l(N-l)}$ for that of $\mathfrak{g}(N, l)$. For each $(P, Q) = (U P_N(k) U^*, V P_N(l) V^*)$ in $G(N, k) \times G(N, l)$, a local normal coordinate at $(P, Q)$ is given by the mapping

$$(X, Y) \in \mathfrak{g}(N, k) \oplus \mathfrak{g}(N, l) \mapsto \big( U e^X P_N(k) e^{-X} U^*,\ V e^Y P_N(l) e^{-Y} V^* \big) \in G(N, k) \times G(N, l). \tag{2.16}$$

By a direct computation using this coordinate, one can compute

$$\nabla\Psi(P, Q) = \sum_r \big\langle U^* Q P \psi'(PQP) P U - U^* P \psi'(PQP) P Q U,\ X_r \big\rangle X_r + \sum_s \big\langle V^* P \psi'(PQP) P Q V - V^* Q P \psi'(PQP) P V,\ Y_s \big\rangle Y_s$$

so that

$$\|\nabla\Psi(P, Q)\|_{HS}^2 = 2\, \|P_N(k)\, U^* P \psi'(PQP) P Q U\, (I - P_N(k))\|_{HS}^2 + 2\, \|P_N(l)\, V^* Q P \psi'(PQP) P V\, (I - P_N(l))\|_{HS}^2 = 4\, \mathrm{Tr}_N\big( \psi'(PQP)^2\, PQP\, (I - PQP) \big).$$

□
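The gradient formula of Lemma 2.2 can be tested by finite differences. The following sketch is an illustration under assumptions: the test function is taken to be $\psi(x) = x^3$, the orthonormal bases of $\mathfrak{g}(N,k)$ and $\mathfrak{g}(N,l)$ are assumed to be the matrices $(e_{ij} - e_{ji})/\sqrt{2}$ and $\sqrt{-1}(e_{ij} + e_{ji})/\sqrt{2}$, directional derivatives are taken along the curves $t \mapsto U e^{tX} P_N(k) e^{-tX} U^*$, and all helper names are hypothetical.

```python
import numpy as np

def basis_g(n, k):
    """Assumed orthonormal basis of g(N,k) for <X,Y> = Re Tr(X Y^*)."""
    basis = []
    for i in range(k):
        for j in range(k, n):
            e = np.zeros((n, n), dtype=complex)
            e[i, j], e[j, i] = 1.0, -1.0
            f = np.zeros((n, n), dtype=complex)
            f[i, j], f[j, i] = 1j, 1j
            basis += [e / np.sqrt(2), f / np.sqrt(2)]
    return basis

def expm_skew(x):
    """exp(x) for skew-Hermitian x, via the eigendecomposition of the Hermitian matrix -i x."""
    w, v = np.linalg.eigh(-1j * x)
    return (v * np.exp(1j * w)) @ v.conj().T

def random_unitary(n, rng):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(z)[0]

def big_psi(p, q):
    """Psi(P,Q) = Tr_N(psi(PQP)) for the test function psi(x) = x^3."""
    m = p @ q @ p
    return float(np.real(np.trace(m @ m @ m)))

def rotated(u, x, t, diag):
    """The point U e^{tX} P_N(k) e^{-tX} U^* of the normal coordinate chart."""
    e = expm_skew(t * x)
    return u @ e @ diag @ e.conj().T @ u.conj().T

def grad_norm_sq_fd(u, v, n, k, l, h=1e-5):
    """Squared gradient norm from central differences along the basis directions."""
    pk = np.diag([1.0] * k + [0.0] * (n - k)).astype(complex)
    pl = np.diag([1.0] * l + [0.0] * (n - l)).astype(complex)
    p, q = u @ pk @ u.conj().T, v @ pl @ v.conj().T
    total = 0.0
    for x in basis_g(n, k):
        d = (big_psi(rotated(u, x, h, pk), q) - big_psi(rotated(u, x, -h, pk), q)) / (2 * h)
        total += d * d
    for y in basis_g(n, l):
        d = (big_psi(p, rotated(v, y, h, pl)) - big_psi(p, rotated(v, y, -h, pl))) / (2 * h)
        total += d * d
    return total, p, q

rng = np.random.default_rng(3)
n, k, l = 5, 2, 3
u, v = random_unitary(n, rng), random_unitary(n, rng)
lhs, p, q = grad_norm_sq_fd(u, v, n, k, l)
m = p @ q @ p
m2 = m @ m
# 4 Tr(psi'(M)^2 M (I - M)) with psi'(x) = 3 x^2, i.e. 36 Tr(M^5 (I - M)).
rhs = 36.0 * float(np.real(np.trace(m2 @ m2 @ m @ (np.eye(n) - m))))
```

The finite-difference value `lhs` and the closed form `rhs` agree up to discretization error; the step size `h` is an ad hoc choice.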



3. Supplementary remarks 



3.1. Classical vs free probabilistic mutual information. The classical mutual information of two random variables, say $X, Y$, is usually formulated to be

$$I(X, Y) := S(\mu_{(X,Y)}, \mu_X \otimes \mu_Y) = \int_{\mathcal{X} \times \mathcal{X}} \log \frac{d\mu_{(X,Y)}}{d(\mu_X \otimes \mu_Y)}(x, y)\, d\mu_{(X,Y)}(x, y),$$

where $\mu_X, \mu_Y$ are the distribution measures of $X, Y$ on the phase space $\mathcal{X}$ and $\mu_{(X,Y)}$ is the joint distribution of $(X, Y)$ on $\mathcal{X} \times \mathcal{X}$. The mutual information is in turn written as (0.1) in the Introduction as long as all the involved quantities (the Shannon-Gibbs entropies of $X$, $Y$ and $(X, Y)$) are finite. This is nothing but what Voiculescu mentioned in [15] as an initial motivation of his introduction of the liberation theory in free probability. Let us now apply the definition of $I(X, Y)$ to our random matrix model of a given pair $(p, q)$ of projections. A random matrix model at our disposal is the Grassmannian random matrix pair $(P(N), Q(N))$ whose joint distribution on $G(N, k(N)) \times G(N, l(N))$ is the measure $\lambda_N^{\psi_N}$ given in (2.3). By the unitary invariance of trace functions and the measure (1.2), it is plain to see that the marginal measures of $\lambda_N^{\psi_N}$ are $\gamma_{G(N,k(N))}$ and $\gamma_{G(N,l(N))}$; thus

$$I(P(N), Q(N)) = S(\lambda_N^{\psi_N}, \lambda_N^0) \quad\text{with}\quad \lambda_N^0 = \gamma_{G(N,k(N))} \otimes \gamma_{G(N,l(N))}.$$

In the proof of Theorem 2.1 we obtained (see (2.8))

$$-\chi_{\mathrm{proj}}(p, q) = \lim_{N \to \infty} \frac{1}{N^2} S(\lambda_N^{\psi_N}, \lambda_N^0) = \lim_{N \to \infty} \frac{1}{N^2} I(P(N), Q(N)).$$

Hence the minus free entropy of two projections can also be obtained as a scaling limit of the classical mutual information of our random matrix model. According to [15] a "heuristic definition" of free mutual information $i^*(p, q)$ should be "$\chi_{\mathrm{proj}}(p) + \chi_{\mathrm{proj}}(q) - \chi_{\mathrm{proj}}(p, q)$" on the analogy of classical theory. However, the actual definition of free mutual information is completely different and based on the so-called liberation process, so that it may be particularly interesting to examine whether or not $i^*(p, q)$ coincides with $-\chi_{\mathrm{proj}}(p, q)$ in view of $\chi_{\mathrm{proj}}(p) = \chi_{\mathrm{proj}}(q) = 0$. In fact, our inequality in Theorem 2.1 is a kind of logarithmic Sobolev inequality and its right-hand side (the Dirichlet form part) is a "derivative" of $i^*(p, q)$, which also gives us a strong reason to do so. Let us give a heuristic argument in the next subsection.
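The two classical expressions for the mutual information discussed in this subsection, the entropy combination $H(X) + H(Y) - H(X,Y)$ and the relative entropy of the joint distribution with respect to the product of its marginals, can be compared on a finite phase space. The sketch below is a toy illustration with an arbitrarily chosen joint probability table; the function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon-Gibbs entropy -sum p log p of a finite probability table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(joint):
    """I(X,Y) = H(X) + H(Y) - H(X,Y)."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

def relative_entropy_to_product(joint):
    """S(mu_(X,Y), mu_X (x) mu_Y) for a finite joint distribution."""
    joint = np.asarray(joint, dtype=float)
    prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

joint = np.array([[0.30, 0.10],
                  [0.05, 0.55]])
i_entropy = mutual_information(joint)
i_relative = relative_entropy_to_product(joint)
i_independent = mutual_information(np.outer([0.4, 0.6], [0.7, 0.3]))
```

Both expressions give the same value, and the mutual information vanishes for a product distribution, which is the classical counterpart of the free entropy of two projections vanishing in the free case.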

3.2. $i^* = -\chi_{\mathrm{proj}}$ for two projections. Although there are still some difficulties on regularity via the liberation process, we give a heuristic computation indicating that $i^*(p, q)$ coincides with $-\chi_{\mathrm{proj}}(p, q)$.

For a pair $(p, q)$ of projections in a tracial $W^*$-probability space $(\mathcal{M}, \tau)$, let $(\nu, \{\alpha_{ij}\}_{i,j=0}^1)$ be its representing data consisting of a (not necessarily probability) measure $\nu$ on $(0, 1)$ and the trace values of four projections $E_{ij}$ (see §§1.1). Let $(u(t))_{t \ge 0}$ be a unitary free Brownian motion starting at $u(0) = 1$ (see [2]). Letting $p(t) := u(t) p u(t)^*$, a liberation process of projections starting at $p$, we write $\nu_t$ and $E_{ij}(t)$ for $\nu$ and $E_{ij}$ corresponding to $(p(t), q)$. Now, assume that $(\nu, \{\alpha_{ij}\}_{i,j=0}^1)$ satisfies the assumptions of Proposition 1.2. By [15, Corollary 8.6] the liberation gradient $J_t := j(\mathbb{C}p(t) + \mathbb{C}(1 - p(t)) : \mathbb{C}q + \mathbb{C}(1 - q))$ exists; hence $\tau(E_{ij}(t)) = \alpha_{ij}$ for all $t > 0$ and $i, j = 0, 1$ thanks to [15, Lemma 12.5] together with $\tau(p(t)) = \tau(p)$. It is quite plausible that each $\nu_t$ has the same properties as $\nu$, i.e., the assumptions of Proposition 1.2. However, we could not derive these from the assumptions on $\nu$, so they have to be assumed here. Furthermore, we suppose that

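• the density $f_t(x)$ of $\nu_t$ is smooth in $(t, x) \in (0, +\infty) \times (0, 1)$,

which is also plausible to hold true. Set

$$E(t) := 1 - (E_{00}(t) + E_{01}(t) + E_{10}(t) + E_{11}(t)),$$
$$X(t) := p(t) q p(t) + (1 - p(t))(1 - q)(1 - p(t)),$$
$$\phi_t(x) := (Hf_t)(x) + \frac{\alpha_{01} + \alpha_{10}}{x} - \frac{\alpha_{00} + \alpha_{11}}{1 - x} \quad\text{for } 0 < x < 1,$$

where $Hf_t$ is the Hilbert transform of $f_t$. As stated in Proposition 1.2 the liberation gradient $J_t$ and the mutual free Fisher information $\varphi^*(p(t) : q) := \varphi^*(\mathbb{C}p(t) + \mathbb{C}(1 - p(t)) : \mathbb{C}q + \mathbb{C}(1 - q))$ are given by

$$J_t = [q, p(t)]\, \phi_t(E(t) X(t) E(t)), \qquad \varphi^*(p(t) : q) = \int_0^1 \phi_t(x)^2 f_t(x)\, x(1 - x)\, dx.$$

For each $m \in \mathbb{N}$ and $t, \varepsilon > 0$ we set $q(t) := u(\varepsilon) u(t + \varepsilon)^* q\, u(t + \varepsilon) u(\varepsilon)^*$ and $J_t' := j(\mathbb{C}p + \mathbb{C}(1 - p) : \mathbb{C}q(t) + \mathbb{C}(1 - q(t)))$. Note that $u(\varepsilon)$ is $*$-free from $\{p, q(t)\}$ and that $(p, q(t))$ has the same distribution as $(p, u(t)^* q u(t))$, which is clearly the same as $(p(t), q)$ too. Hence $(p, q(t), J_t')$ and $(p(t), q, J_t)$ behave in the same way under $\tau$. Therefore, by [15, Corollary 5.7] we have

$$\tau\big( (p(t + \varepsilon) q p(t + \varepsilon))^m \big) = \tau\big( (p\, u(t + \varepsilon)^* q\, u(t + \varepsilon))^m \big) = \tau\big( (u(\varepsilon) p u(\varepsilon)^* q(t))^m \big)$$
$$= \tau\big( (p q(t))^m \big) + \frac{m\varepsilon}{2}\, \tau\big( [J_t', p]\, (q(t) p q(t))^{m-1} \big) + O(\varepsilon^2)$$
$$= \tau\big( (p(t) q p(t))^m \big) + \frac{m\varepsilon}{2}\, \tau\big( [J_t, p(t)]\, (q p(t) q)^{m-1} \big) + O(\varepsilon^2)$$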
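so that

$$\frac{d}{d\varepsilon}\Big|_{\varepsilon = 0} \tau\big( (p(t + \varepsilon) q p(t + \varepsilon))^m \big) = \frac{m}{2}\, \tau\big( [J_t, p(t)]\, (q p(t) q)^{m-1} \big)$$
$$= \frac{m}{2}\, \tau\big( [[q, p(t)], p(t)]\, (q p(t) q)^{m-1}\, \phi_t(E(t) X(t) E(t)) \big)$$
$$= m\, \tau\big( \big( (p(t) q p(t))^m - (p(t) q p(t))^{m+1} \big)\, \phi_t\big( E(t) (p(t) q p(t)) E(t) \big) \big)$$

because both $E(t)$ and $X(t)$ are in the center of $\{p(t), q\}''$. In view of the assumption on $(t, x) \mapsto f_t(x)$ the above equation implies that

$$\int_0^1 x^m \frac{\partial f_t}{\partial t}(x)\, dx = m \int_0^1 (x^m - x^{m+1})\, \phi_t(x) f_t(x)\, dx = -\int_0^1 x^m \frac{\partial}{\partial x}\big( x(1 - x)\, \phi_t(x) f_t(x) \big)\, dx,$$

which yields

$$\frac{\partial}{\partial t} f_t(x) = -\frac{\partial}{\partial x}\big( x(1 - x)\, \phi_t(x) f_t(x) \big). \tag{3.1}$$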

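Proposition 1.1 says that

$$\chi_{\mathrm{proj}}(p(t), q) = \frac{1}{4} \int_0^1\!\!\int_0^1 \log|x - y| \cdot f_t(x) f_t(y)\, dx\, dy + \frac{\alpha_{01} + \alpha_{10}}{2} \int_0^1 \log x \cdot f_t(x)\, dx + \frac{\alpha_{00} + \alpha_{11}}{2} \int_0^1 \log(1 - x) \cdot f_t(x)\, dx - C.$$

Differentiating the above and applying (3.1) one can perform a heuristic computation as follows:

$$\frac{d}{dt} \chi_{\mathrm{proj}}(p(t), q) = -\frac{1}{2} \int_0^1\!\!\int_0^1 \log|x - y| \cdot \frac{\partial}{\partial x}\big( x(1 - x) \phi_t(x) f_t(x) \big) f_t(y)\, dx\, dy$$
$$\quad - \frac{\alpha_{01} + \alpha_{10}}{2} \int_0^1 \log x \cdot \frac{\partial}{\partial x}\big( x(1 - x) \phi_t(x) f_t(x) \big)\, dx - \frac{\alpha_{00} + \alpha_{11}}{2} \int_0^1 \log(1 - x) \cdot \frac{\partial}{\partial x}\big( x(1 - x) \phi_t(x) f_t(x) \big)\, dx$$
$$= \frac{1}{2} \int_0^1\!\!\int_0^1 \frac{1}{x - y} \cdot x(1 - x) \phi_t(x) f_t(x) f_t(y)\, dx\, dy$$
$$\quad + \frac{\alpha_{01} + \alpha_{10}}{2} \int_0^1 \frac{1}{x} \cdot x(1 - x) \phi_t(x) f_t(x)\, dx - \frac{\alpha_{00} + \alpha_{11}}{2} \int_0^1 \frac{1}{1 - x} \cdot x(1 - x) \phi_t(x) f_t(x)\, dx$$
$$= \frac{1}{2} \int_0^1 \phi_t(x)^2\, x(1 - x) f_t(x)\, dx = \frac{1}{2} \varphi^*(p(t) : q).$$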

In particular, this shows that $\chi_{\mathrm{proj}}(p(t), q)$ is an increasing function of $t \in (0, \infty)$. Moreover,
since Theorem 2.1 gives $-\chi_{\mathrm{proj}}(p(t), q) \le \varphi^*(p(t) : q)$ for all $t > 0$, we have
\[
-\frac{1}{2}\int_0^\infty \chi_{\mathrm{proj}}(p(t), q)\,dt \le \frac{1}{2}\int_0^\infty \varphi^*(p(t) : q)\,dt = i^*(p : q) < +\infty
\]
by [13, Proposition 10.11.c] so that $\lim_{t \to +\infty} \chi_{\mathrm{proj}}(p(t), q) = 0$. Therefore,
\[
i^*(p : q) = \frac{1}{2}\int_0^\infty \varphi^*(p(t) : q)\,dt = \Bigl[\chi_{\mathrm{proj}}(p(t), q)\Bigr]_{t=0}^{t=\infty} = -\chi_{\mathrm{proj}}(p, q)
\]
as long as $\chi_{\mathrm{proj}}(p, q) = \lim_{t \searrow 0} \chi_{\mathrm{proj}}(p(t), q)$ is valid.
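The passage from integrability to the vanishing limit, and then to the final identity, can be made explicit. The sketch below writes $g(t) := -\chi_{\mathrm{proj}}(p(t), q)$ and assumes, beyond what is stated above, that $g \ge 0$ (that is, $\chi_{\mathrm{proj}} \le 0$) and that the heuristic identity $\frac{d}{dt}\chi_{\mathrm{proj}}(p(t), q) = \frac12\varphi^*(p(t) : q)$ holds for all $t > 0$.

```latex
% g is nonnegative, nonincreasing (since \chi_proj(p(t),q) is increasing in t)
% and integrable on (0,\infty); hence g(s) >= g(t) for s in [t/2, t] and
\[
0 \le g(t) \le \frac{2}{t}\int_{t/2}^{t} g(s)\,ds \le \frac{2}{t}\int_0^\infty g(s)\,ds
 \longrightarrow 0 \qquad (t \to +\infty).
\]
% Fundamental theorem of calculus on [s, T], 0 < s < T:
\[
\frac12\int_s^T \varphi^*(p(t):q)\,dt = \chi_{\mathrm{proj}}(p(T),q) - \chi_{\mathrm{proj}}(p(s),q).
\]
% Letting T -> +\infty and then s -> 0:
\[
i^*(p:q) = -\lim_{s \searrow 0}\chi_{\mathrm{proj}}(p(s),q).
\]
```

The last line is exactly why the continuity of $t \mapsto \chi_{\mathrm{proj}}(p(t), q)$ at $t = 0$ is the remaining point to be justified.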



3.3. A generalization of main theorem. Free analogs of logarithmic Sobolev inequality 
were shown in jHj for the single self-adjoint case and in [7] for the single unitary case. The
inequality in Theorem 2.1 can be understood as such a free analog for two projections with 
respect to the trivial "hamiltonian" or trivial "potential function." Thus it is natural to make 
an attempt of generalizing it to the case where the given "hamiltonian" is non-trivial. In fact, 
our method based on the random matrix approximation still works for such an attempt. To 
do so, we need, on one hand, to give the definitions of "relative free entropy" and "relative 
free Fisher information" for pairs of projections, and on the other hand (as a technical side), 
to examine Bakry and Emery's $\Gamma_2$-criterion by computing the Hessian of a certain trace
function on G(N, k(N)) x G(N, l(N)). 

As mentioned in jSJ §3], the distribution of a general pair of projections can be understood 
as a tracial state on the C*-algebra 

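\[
A := \{ a(t) = [a_{ij}(t)] \in C([0, 1]; M_2(\mathbb{C})) : a(0), a(1) \text{ are diagonals} \} \cong C^*(\mathbb{Z}_2 * \mathbb{Z}_2)
\]
with the canonical generators of two projections:
\[
p(t) := \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
q(t) := \begin{bmatrix} t & \sqrt{t(1-t)} \\ \sqrt{t(1-t)} & 1-t \end{bmatrix}.
\]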
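As a quick check (a sketch added here, taking $p(t)$ to be the constant diagonal projection $\mathrm{diag}(1, 0)$ and $q(t)$ the symmetric matrix with diagonal entries $t$, $1-t$ and off-diagonal entries $\sqrt{t(1-t)}$), one can verify directly that $q(t)$ is a projection and that $t$ is the spectral parameter of $p q p$:

```latex
\[
q(t)^2 =
\begin{bmatrix}
 t^2 + t(1-t) & \bigl(t + (1-t)\bigr)\sqrt{t(1-t)} \\
 \bigl(t + (1-t)\bigr)\sqrt{t(1-t)} & t(1-t) + (1-t)^2
\end{bmatrix}
= q(t),
\qquad
p(t)\,q(t)\,p(t) = \begin{bmatrix} t & 0 \\ 0 & 0 \end{bmatrix} = t\,p(t).
\]
% At the endpoints q is diagonal, as required for membership in A:
\[
q(0) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},
\qquad
q(1) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]
```

In particular $p$ and $q$ are orthogonal at $t = 0$ and coincide at $t = 1$, which is how the four boundary weights $\alpha_{ij}$ attach to the diagonal entries at $t = 0$ and $t = 1$.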



The tracial state space of $A$ is denoted by $TS(A)$. An arbitrary $\tau \in TS(A)$ is uniquely
determined by the representing data $(\nu, \{\alpha_{ij}\}_{i,j=0}^{1})$ of $(p, q)$ in the GNS representation of $A$
with respect to $\tau$ (see §§1.1); namely,
\[
\tau(a) = \alpha_{10} a_{11}(0) + \alpha_{01} a_{22}(0) + \alpha_{11} a_{11}(1) + \alpha_{00} a_{22}(1) + \int_0^1 \mathrm{tr}_2(a(t))\,d\nu(t)
\]

for every $a \in A$, including (1.4). We set $\chi_{\mathrm{proj}}(\tau) := \chi_{\mathrm{proj}}(p, q)$ in the GNS representation
with respect to $\tau$. Furthermore, let $h$ be a self-adjoint element in $A$ which we consider as a
general hamiltonian, and define a continuous function $\tilde h(t) := \mathrm{Tr}_2(h(t))$ on $[0, 1]$. Then, let us
introduce the relative free entropy $\widetilde\Sigma_h(\tau)$ of $\tau$ with respect to $h$ in the following way: When
$\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0$, define
\[
\widetilde\Sigma_h(\tau) := -\chi_{\mathrm{proj}}(\tau) + \tau(h) + B_h,
\]
where $B_h$ is the maximum of the entropy functional $\tau' \in TS(A) \mapsto \chi_{\mathrm{proj}}(\tau') - \tau'(h)$ under
the condition that $\tau'(p) = \tau(p)$ and $\tau'(q) = \tau(q)$. More concretely,
\[
\widetilde\Sigma_h(\tau) = -\frac{1}{4}\int_0^1\!\!\int_0^1 \log|x - y|\,d\nu(x)\,d\nu(y)
+ \frac{1}{2}\int_0^1 \bigl(\tilde h(x) - (\alpha_{10} + \alpha_{01})\log x - (\alpha_{11} + \alpha_{00})\log(1 - x)\bigr)\,d\nu(x) + C_h,
\]
where $C_h := C + B_h$ with $C$ in (1.6). Behind this definition is the large deviation principle
of the empirical distribution of the random projection matrix pair $\lambda_N^h$ defined by replacing
$\psi_N$ in (2.3) by $\tilde h$; its proof is essentially the same as in 0. Indeed, as in (a) and (b) in the proof
of Theorem 2.1 we have $C_h = \lim_{N\to\infty} \frac{1}{N^2}\log Z_N^h$ for the normalization constant $Z_N^h$, and
$\widetilde\Sigma_h(\tau)$ appears as the rate function, justifying the term "relative free entropy."
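Returning to the formula for $\tau(a)$ in terms of the representing data, it may help to evaluate it on $1$, $p$ and $q$. The sketch below assumes that $\mathrm{tr}_2$ denotes the normalized trace on $M_2(\mathbb{C})$ (this normalization is not stated in the present section), that $p(t) = \mathrm{diag}(1, 0)$ and $q(t)$ are the canonical generators, and writes $|\nu| := \int_0^1 d\nu(t)$.

```latex
\begin{align*}
\tau(1) &= \alpha_{10} + \alpha_{01} + \alpha_{11} + \alpha_{00} + |\nu| = 1, \\
\tau(p) &= \alpha_{10} + \alpha_{11} + \tfrac12|\nu|, \\
\tau(q) &= \alpha_{01} + \alpha_{11} + \tfrac12|\nu| ,
\end{align*}
% since q_{11}(0) = 0, q_{22}(0) = 1, q_{11}(1) = 1, q_{22}(1) = 0. Hence
\[
\tau(p) - \tau(q) = \alpha_{10} - \alpha_{01},
\qquad
\tau(p) + \tau(q) - 1 = \alpha_{11} - \alpha_{00}.
\]
% Under \alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0 with all \alpha_{ij} >= 0:
\[
\alpha_{11} = \max\{\tau(p)+\tau(q)-1,\,0\}, \quad
\alpha_{00} = \max\{1-\tau(p)-\tau(q),\,0\}, \quad
\alpha_{10} = \max\{\tau(p)-\tau(q),\,0\}, \quad
\alpha_{01} = \max\{\tau(q)-\tau(p),\,0\}.
\]
```

So, under these assumptions, fixing $\tau'(p) = \tau(p)$ and $\tau'(q) = \tau(q)$ in the definition of $B_h$ fixes the four boundary weights among those $\tau'$ that satisfy $\alpha_{00}\alpha_{11} = \alpha_{01}\alpha_{10} = 0$, and only the measure $\nu$ remains free.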
On the other hand, when $\tilde h(t)$ is assumed to be a $C^1$-function, let us define the relative
free Fisher information $\Phi_h(\tau)$ of $\tau$ with respect to $h$ as follows: If the $\alpha_{ij}$'s satisfy the same
condition as above and moreover $\nu$ has the density $f := d\nu/dx \in L^3((0, 1), x(1 - x)\,dx)$, then
\[
\Phi_h(\tau) := \int_0^1 \Bigl( (Hf)(x) + \frac{\alpha_{10} + \alpha_{01}}{x} - \frac{\alpha_{11} + \alpha_{00}}{1 - x} - \tilde h'(x) \Bigr)^2 x(1 - x)\,d\nu(x);
\]
otherwise $\Phi_h(\tau) := +\infty$. (Remark that the above integral is well-defined permitting $+\infty$ and
$\Phi_h(\tau) < +\infty$ is equivalent to the condition p.8|).) Note that when $\lambda_N^{\psi}$ is given under the
same assumptions (A) and (B) for $\tau$ as in the proof of Theorem 2.1, $\Phi_h(\tau)$ appears as the
scaling limit of Dirichlet form:

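\[
\lim_{N\to\infty} \frac{1}{2N^3} \int_{G(N,k(N)) \times G(N,l(N))} \Bigl\| \nabla \log \frac{d\lambda_N^{\psi}}{d\lambda_N^{h}}(P, Q) \Bigr\|_{HS}^{2} \, d\lambda_N^{\psi}(P, Q).
\]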



In the trivial case where $h = 0$, we have $\widetilde\Sigma_h(\tau) = -\chi_{\mathrm{proj}}(p, q)$ and $\Phi_h(\tau) = \varphi^*(p : q)$ for
$(p, q)$ in the GNS representation with respect to $\tau$ so that the next proposition is a slight
generalization of Theorem 2.1.

Proposition 3.1. Let $h$ be a self-adjoint element in $A$. If $\tilde h(t)$ is a $C^2$-function on $[0, 1]$
and it satisfies $c_1\|\tilde h'\|_\infty + c_2\|\tilde h''\|_\infty < 1$ for certain universal constants $c_1, c_2 > 0$, then the
inequality
\[
\widetilde\Sigma_h(\tau) \le \frac{1}{1 - c_1\|\tilde h'\|_\infty - c_2\|\tilde h''\|_\infty}\,\Phi_h(\tau)
\]
holds for every $\tau \in TS(A)$.

Sketch of Proof. The proof is essentially the same as that of Theorem 2.1, and the only difference
is in confirming Bakry and Emery's $\Gamma_2$-criterion$^1$ for $\lambda_N^h$ under the assumption of the
proposition. The criterion in this case says that one has a constant $c > 0$ so that
\[
\mathrm{Ric}(G(N, k) \times G(N, l)) + \mathrm{Hess}(\Psi_N) \ge c \cdot N I_{2k(N-k)+2l(N-l)},
\]
where $\mathrm{Hess}(\Psi_N)$ stands for the Hessian of the trace function
\[
\Psi_N(P, Q) := N\,\mathrm{Tr}_N(\tilde h(PQP)) \quad \text{for } (P, Q) \in G(N, k) \times G(N, l).
\]
Thanks to (2.2) we need only to estimate $\mathrm{Hess}(\Psi_N)$ from below. One can explicitly compute
$\mathrm{Hess}(\Psi_N)$ in terms of the normal coordinate (2.16), which contains many terms of trace
functions involving $\tilde h'(PQP)$ and $\tilde h''(PQP)$. A rough estimation of the formula shows that
there are two universal constants $c_1, c_2 > 0$ so that
\[
\mathrm{Hess}(\Psi_N) \ge -N\bigl(c_1\|\tilde h'\|_\infty + c_2\|\tilde h''\|_\infty\bigr) I_{2k(N-k)+2l(N-l)},
\]
while we do not know the best possible $c_1, c_2$. Now, the proposition follows from the proof
of Theorem 2.1 together with the above estimate. □
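How the smallness hypothesis of the proposition enters can be seen by adding the two curvature bounds. The sketch below rests on an assumption: (2.2) is not reproduced in this section, and it is taken here to provide a Ricci lower bound of the form $\mathrm{Ric} \ge N I_d$ in the normalization used above, where $d := 2k(N-k) + 2l(N-l)$.

```latex
\[
\mathrm{Ric}(G(N,k)\times G(N,l)) + \mathrm{Hess}(\Psi_N)
 \;\ge\; N I_d - N\bigl(c_1\|\tilde h'\|_\infty + c_2\|\tilde h''\|_\infty\bigr) I_d
 \;=\; c \cdot N I_d ,
\]
\[
c := 1 - c_1\|\tilde h'\|_\infty - c_2\|\tilde h''\|_\infty > 0 .
\]
```

Under this reading, the constant $c$ in the criterion is exactly the denominator in the inequality of the proposition, and for $h = 0$ one gets $c = 1$, consistent with the statement that the proposition reduces to Theorem 2.1 in the trivial case.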

More details on computation of the Hessian $\mathrm{Hess}(\Psi_N)$ as well as the constants $c_1, c_2$ in the
above proof can be found in [HI Remark 5.6].